Principal Component Analysis

Machine Learning at a Glance

Principal Component Analysis (PCA)

The Goals of PCA

Three Dimensions

This idea extends to three-dimensional data. In the first example (shown below), a line (one dimension) can provide a good representation of the original data:

In the second example, however, a line provides a poor representation, while a plane (two dimensions) provides a good one:
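The two pictures can be mimicked with simulated data. The sketch below is purely illustrative: the data are made up (points scattered with small noise around a line, and around a plane, in three dimensions), and the variable names are hypothetical. It compares how much of the total variance the first one or two principal components capture. `prcomp()` is left at its default `scale. = FALSE` because all three coordinates are already on the same scale.

```r
# Hypothetical simulated data, for illustration only
set.seed(1)
n <- 200

# Example 1: points scattered tightly around a line through 3-D space
u <- rnorm(n)
line_data <- cbind(u, 2 * u, -u) + matrix(rnorm(3 * n, sd = 0.1), nrow = n)

# Example 2: points scattered tightly around a plane through 3-D space
u1 <- rnorm(n)
u2 <- rnorm(n)
plane_data <- cbind(u1, u2, u1 + u2) + matrix(rnorm(3 * n, sd = 0.1), nrow = n)

# Proportion of the total variance captured by each principal component
pve <- function(data) {
  s <- prcomp(data)$sdev
  s^2 / sum(s^2)
}

pve_line <- pve(line_data)
pve_plane <- pve(plane_data)

round(pve_line, 3)           # one component carries almost all the variance
round(cumsum(pve_plane), 3)  # two components are needed to get close to 100%
```

For the line, the first component alone explains nearly everything; for the plane, the first component falls well short and the first two together are needed.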

How are PC dimensions defined?


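Each principal component is a linear combination of the p original variables. The weights \(\phi_{jk}\) are called loadings, and each loading vector is normalized so that \(\sum_{j=1}^{p} \phi_{jk}^2 = 1\). \(Z_1\) is the combination with the largest variance; \(Z_2\) has the largest variance among combinations uncorrelated with \(Z_1\), and so on:

\[Z_1 = \phi_{11} X_1 + \phi_{21} X_2 + \ldots + \phi_{p1} X_p\]

\[Z_2 = \phi_{12} X_1 + \phi_{22} X_2 + \ldots + \phi_{p2} X_p\]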
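As a quick check of the formula for \(Z_1\), the sketch below computes the first principal component score of every observation by hand, multiplying the standardized data by the first loading vector, and compares it with the scores that `prcomp()` stores in its `x` component. It assumes the same standardized `USArrests` data used in the following slides; `phi1` and `Z1` are illustrative names only.

```r
data("USArrests")
X <- scale(USArrests)          # standardize each column
pca <- prcomp(X)

phi1 <- pca$rotation[, "PC1"]  # loadings phi_11, ..., phi_p1

# Z_1 = phi_11 X_1 + phi_21 X_2 + ... + phi_p1 X_p, for every state at once
Z1 <- as.vector(X %*% phi1)

# prcomp() keeps the same scores in pca$x
all.equal(Z1, unname(pca$x[, "PC1"]))

# The loading vector has unit length: the squared loadings sum to 1
sum(phi1^2)
```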

Getting the loadings


data("USArrests")
X <- USArrests
X <- scale(X)  ## Standardize: each column now has mean 0 and standard deviation 1
1/(nrow(X) - 1) * t(X) %*% X  ## Calculating it ourselves: X^T X / (n - 1)
##              Murder   Assault   UrbanPop      Rape
## Murder   1.00000000 0.8018733 0.06957262 0.5635788
## Assault  0.80187331 1.0000000 0.25887170 0.6652412
## UrbanPop 0.06957262 0.2588717 1.00000000 0.4113412
## Rape     0.56357883 0.6652412 0.41134124 1.0000000
cov(X) ## Using the built-in "cov" function
##              Murder   Assault   UrbanPop      Rape
## Murder   1.00000000 0.8018733 0.06957262 0.5635788
## Assault  0.80187331 1.0000000 0.25887170 0.6652412
## UrbanPop 0.06957262 0.2588717 1.00000000 0.4113412
## Rape     0.56357883 0.6652412 0.41134124 1.0000000
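The ones on the diagonal above are no accident: once every column has standard deviation 1, the covariance matrix of the standardized data is exactly the correlation matrix of the raw data. A minimal check, assuming the `USArrests` data from base R:

```r
data("USArrests")
X <- scale(USArrests)

C_scaled <- cov(X)        # covariance of the standardized data
R_raw <- cor(USArrests)   # correlation matrix of the raw data

all.equal(C_scaled, R_raw)
```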

Getting the loadings

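The loadings are the eigenvectors of the covariance matrix \(\mathbf{C}\), which can be decomposed as

\[\mathbf{C} = \mathbf{V}\mathbf{L}\mathbf{V}^T\]

where the columns of \(\mathbf{V}\) are the eigenvectors (the loading vectors) and \(\mathbf{L}\) is a diagonal matrix of the eigenvalues, i.e. the variance of each principal component.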

C <- cov(X)
eigen(C)
## eigen() decomposition
## $values
## [1] 2.4802416 0.9897652 0.3565632 0.1734301
## 
## $vectors
##            [,1]       [,2]       [,3]        [,4]
## [1,] -0.5358995  0.4181809 -0.3412327  0.64922780
## [2,] -0.5831836  0.1879856 -0.2681484 -0.74340748
## [3,] -0.2781909 -0.8728062 -0.3780158  0.13387773
## [4,] -0.5434321 -0.1673186  0.8177779  0.08902432
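To connect the `eigen()` output back to the decomposition \(\mathbf{C} = \mathbf{V}\mathbf{L}\mathbf{V}^T\), the sketch below rebuilds the covariance matrix from the eigenvectors and eigenvalues and confirms that \(\mathbf{V}\) is orthogonal. The names `V`, `L` and `C_rebuilt` are illustrative; `check.attributes = FALSE` is used only because the rebuilt matrix has no row and column names.

```r
data("USArrests")
X <- scale(USArrests)
C <- cov(X)

e <- eigen(C)
V <- e$vectors       # columns are the loading vectors
L <- diag(e$values)  # eigenvalues on the diagonal

# V L V^T reproduces the covariance matrix
C_rebuilt <- V %*% L %*% t(V)
all.equal(C_rebuilt, C, check.attributes = FALSE)

# V is orthogonal: V^T V is the identity matrix
all.equal(crossprod(V), diag(ncol(V)))
```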

Getting the loadings

X <- USArrests
X <- scale(X) ## Standardize
prcomp(X)     ## Perform PCA
## Standard deviations (1, .., p=4):
## [1] 1.5748783 0.9948694 0.5971291 0.4164494
## 
## Rotation (n x k) = (4 x 4):
##                 PC1        PC2        PC3         PC4
## Murder   -0.5358995  0.4181809 -0.3412327  0.64922780
## Assault  -0.5831836  0.1879856 -0.2681484 -0.74340748
## UrbanPop -0.2781909 -0.8728062 -0.3780158  0.13387773
## Rape     -0.5434321 -0.1673186  0.8177779  0.08902432
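The `prcomp()` output matches the eigendecomposition: its rotation matrix holds the same loadings, and its squared standard deviations are the eigenvalues (for example, 1.5748783² ≈ 2.4802). The sketch below checks this, comparing loadings in absolute value because the sign of each eigenvector is arbitrary and may differ between methods. It also shows that `prcomp(USArrests, scale. = TRUE)` gives the same result as standardizing first with `scale()`.

```r
data("USArrests")
X <- scale(USArrests)

pca <- prcomp(X)
e <- eigen(cov(X))

# Variances of the principal components are the eigenvalues of C
all.equal(pca$sdev^2, e$values)

# Loadings agree up to sign (each column may be multiplied by -1)
all.equal(abs(unname(pca$rotation)), abs(unname(e$vectors)))

# Letting prcomp() standardize the data itself gives the same components
pca2 <- prcomp(USArrests, scale. = TRUE)
all.equal(pca2$sdev, pca$sdev)
```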

What next?

## Standard deviations (1, .., p=9):
## [1] 1.9954525 1.0788358 0.8867900 0.6396917 0.4662448 0.3487425 0.3202376
## [8] 0.2513177 0.1671921
## 
## Rotation (n x k) = (9 x 9):
##                      PC1         PC2         PC3         PC4         PC5
## admissionRate -0.1804793 -0.03538274  0.92780641 -0.31894586  0.01308557
## ACTmath        0.4831079  0.03595880  0.05275053 -0.02202215 -0.20376965
## ACTenglish     0.4258063  0.08076983  0.07204367 -0.09362101 -0.12816934
## undergrads     0.1336508 -0.61270224  0.19275286  0.49993205 -0.48714672
## cost           0.2799760  0.40436858 -0.03533647 -0.34856772 -0.50273913
## gradRate       0.4027967  0.14617155  0.19023908  0.38344935  0.26046245
## FYretention    0.4221292  0.10774438  0.16251349  0.14930097  0.51048338
## fedloan       -0.3240619  0.44623432  0.11016604  0.49268655 -0.01230583
## debt          -0.1049739  0.46894908  0.13439471  0.32485065 -0.35104894
##                       PC6         PC7          PC8         PC9
## admissionRate -0.04215275  0.01827869  0.009742059 -0.03531274
## ACTmath       -0.40686339 -0.30828053 -0.028677929 -0.67758906
## ACTenglish    -0.36307868 -0.16340715 -0.424831305  0.66541190
## undergrads     0.24316410  0.09309020 -0.118037660  0.02466435
## cost           0.40893056  0.45564041 -0.053956579 -0.06972096
## gradRate      -0.24096830  0.50906734  0.485465178  0.11149885
## FYretention    0.60484531 -0.25016313 -0.257837343 -0.07748462
## fedloan       -0.16700746  0.22824611 -0.569676237 -0.19055711
## debt           0.15070219 -0.53649390  0.418412811  0.19140498
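One common next step with output like this is to ask how many components are worth keeping. The sketch below computes the proportion of variance explained (PVE) by each component, using only the nine standard deviations copied from the printed output above (the underlying dataset and code are not shown here, so nothing else is assumed about them).

```r
# Standard deviations copied from the printed prcomp() output above
sdev <- c(1.9954525, 1.0788358, 0.8867900, 0.6396917, 0.4662448,
          0.3487425, 0.3202376, 0.2513177, 0.1671921)

pve <- sdev^2 / sum(sdev^2)  # proportion of variance explained by each PC
cum_pve <- cumsum(pve)       # cumulative proportion

round(rbind(PVE = pve, Cumulative = cum_pve), 3)
```

Plotting `pve` with `plot(pve, type = "b")` gives a scree plot, where an "elbow" is often used as an informal guide to the number of components to retain.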

What next?